TR-2004017: Towards a Formal Concept Analysis Approach to Exploring Communities on the World Wide Web
Authors
Abstract
An interesting problem associated with the World Wide Web (Web) is the definition and delineation of so-called Web communities. The Web can be characterized as a directed graph whose nodes represent Web pages and whose edges represent hyperlinks. An authority is a page that is linked to by high-quality hubs, while a hub is a page that links to high-quality authorities. A Web community is a highly interconnected aggregate of hubs and authorities. We define a community core to be a maximally connected bipartite subgraph of the Web graph. We observe that a Web subgraph can be viewed as a formal context and that Web communities can be modeled by formal concepts. Additionally, the notions of hub and authority are captured by the extent and intent, respectively, of a concept. Though Formal Concept Analysis (FCA) has previously been applied to the Web, none of the FCA-based approaches that we are aware of considers the link structure of the Web pages. We utilize notions from FCA to explore the community structure of the Web graph. We discuss the problem of utilizing this structure to locate and organize communities in the form of a knowledge base built from the resulting concept lattice, and we discuss methods to reduce the complexity of the knowledge base by coalescing similar Web communities. We present preliminary experimental results obtained from real Web data that demonstrate the usefulness of FCA for improving Web search.
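The mapping described in the abstract (a Web subgraph as a formal context, communities as formal concepts, hubs as the extent and authorities as the intent) can be illustrated with a minimal sketch. The toy link graph, the page names, and the function `formal_concepts` are hypothetical and chosen only for illustration; the naive closure-by-intersection enumeration is a textbook approach and is not claimed to be the algorithm used in the report.

```python
def formal_concepts(links):
    """Enumerate all formal concepts of a link context.

    links maps each linking page (object) to the set of pages it links to
    (attributes). Returns a set of (extent, intent) pairs of frozensets,
    where the extent holds hubs and the intent holds authorities.
    """
    objects = set(links)
    attributes = set()
    for targets in links.values():
        attributes |= set(targets)

    # Every intent is an intersection of object intents (or the full
    # attribute set), so close that family under intersection.
    intents = {frozenset(attributes)}
    for page in objects:
        intents |= {intent & frozenset(links[page]) for intent in intents}

    def extent_of(intent):
        # All pages that link to every page in the intent.
        return frozenset(p for p in objects if intent <= set(links[p]))

    return {(extent_of(intent), intent) for intent in intents}


# Hypothetical toy Web subgraph: h* pages act as hubs, a* pages as authorities.
links = {
    "h1": {"a1", "a2"},
    "h2": {"a1", "a2"},
    "h3": {"a2", "a3"},
}

concepts = formal_concepts(links)

# Keep only concepts with at least one hub and one authority; each one is a
# maximal complete bipartite subgraph of the toy graph.
cores = sorted(
    (sorted(hubs), sorted(auths))
    for hubs, auths in concepts
    if hubs and auths
)
for hubs, auths in cores:
    print(hubs, "->", auths)
```

In this sketch, the pair of hubs `h1`, `h2` together with the authorities `a1`, `a2` forms one concept: both hubs link to both authorities, and neither set can be enlarged without breaking that property. The enumeration is exponential in the worst case, so it is only suitable for small subgraphs.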
Similar resources
LNAI 3403 - Towards a Formal Concept Analysis Approach to Exploring Communities on the World Wide Web
An interesting problem associated with the World Wide Web (Web) is the definition and delineation of so-called Web communities. The Web can be characterized as a directed graph whose nodes represent Web pages and whose edges represent hyperlinks. An authority is a page that is linked to by high-quality hubs, while a hub is a page that links to high-quality authorities. A Web community is a highl...
Towards a Formal Concept Analysis Approach to Exploring Communities on the World Wide Web
An interesting problem associated with the World Wide Web (Web) is the definition and delineation of so-called Web communities. The Web can be characterized as a directed graph whose nodes represent Web pages and whose edges represent hyperlinks. An authority is a page that is linked to by high-quality hubs, while a hub is a page that links to high-quality authorities. A Web community is a high...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. This unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...
Query-Driven Conceptual Browsing: A Semi-Automated Approach for Building and Exploring Concepts on the Web
The presence of communities, which are groups of highly cross-referenced pages together representing a single concept, is a striking feature of the World Wide Web. Quite often a group of communities, each topically coherent within itself, may be related through a common concept manifested in each of them. Motivated by this observation, we present a method for query-driven conceptual browsing fo...
Learning Adaptive Domain Models from Click Data to Bootstrap Interactive Web Search
Today, searchers exploring the World Wide Web have come to expect enhanced search interfaces – query completion and related searches have become standard. Here we propose a Formal Concept Analysis lattice as an underlying domain model to provide a source of query refinements. The initial lattice is constructed using NLP. User clicks on documents, seen as implicit user feedback, are harnessed to...